ABSTRACT

Nowadays, to facilitate communication and cooperation among employees, many companies have adopted a new family of online social networks called “enterprise social networks” (ESNs). ESNs provide employees with various professional services to help them deal with daily work issues. Meanwhile, employees in companies are usually organized into different hierarchies according to the relative ranks of their positions. The internal management structure of a company can be outlined visually with an organizational chart, which is normally kept confidential from the public out of privacy and security concerns. In this paper, we study the IOC (Inference of Organizational Chart) problem, which aims to identify a company's internal organizational chart based on the heterogeneous online ESN launched in it. IOC is very challenging to address because, to guarantee smooth operations, the internal organizational charts of companies need to meet certain structural requirements (about their depth and width). To solve the IOC problem, a novel unsupervised method, Create (ChArT REcovEr), is proposed in this paper, which consists of 3 steps: (1) social stratification of ESN users into different social classes, (2) supervision link inference from managers to subordinates, and (3) consecutive social class matching to prune the redundant supervision links. Extensive experiments conducted on a real-world online ESN dataset demonstrate that Create performs very well in addressing the IOC problem.
Figure 1: An example of organizational chart inference from online ESN.

1. INTRODUCTION 

In social sciences, people in social organizations (e.g., a country or a company) can be categorized into different rankings of socioeconomic tiers based on factors like wealth, income, social status, occupation, power, etc. In this paper, we take “company” as an example, and the internal hierarchical structure of employees in a company can be outlined formally with the company organizational chart. Most company organizational charts are tree-structured diagrams with the CEO at the root, Executive Vice Presidents (EVPs) at the second level, and so forth. The company organizational chart shows the internal management structure of the company as well as the relationships and relative ranks of employees with different positions/jobs, and is a common visual depiction of how a company is organized.

Nowadays, to facilitate collaboration and communication among employees, a new type of online social network named enterprise social networks (ESNs) has been adopted inside the firewalls of many corporations. A representative example of online ESNs is Yammer.^1 Over 500,000 businesses around the world are now using Yammer, including 85% of the Fortune 500.^2 Yammer provides employees with various enterprise social network services to help them deal with daily work issues and contains abundant heterogeneous information generated by employees’ online social activities.

Problem Studied: The internal organizational chart of a company is usually confidential to the public for privacy and security reasons. In this paper, we want to infer the organizational chart of a company based on the heterogeneous information in the online ESN launched in the company, and the problem is formally named the Inference of Organizational Chart (IOC) problem.

^1 https://www.yammer.com/
^2 https://about.yammer.com/why-yammer/

To help illustrate the IOC problem more clearly, we also give an example in Figure 1, where the left plot shows an online ESN adopted in a company and the right plot shows the company’s organizational chart. In the ESN, users can have different types of social activities, e.g., follow other users, join groups, write/reply/like posts, etc. Meanwhile, in the organizational chart, employees are connected by supervision links from managers to subordinates and are organized into a rooted tree of depth 2 with CEO “Adam” at the root. In companies, managers can usually supervise several subordinates simultaneously, while each subordinate reports to only one manager. For instance, in Figure 1, CEO “Adam” manages “Bob” and “Candy” concurrently, while “David” only needs to report to “Bob”.

The IOC problem is both interesting and important. Besides inferring company organizational charts, it can also be applied in other concrete real-world applications: (1) identifying the command structures of terrorist organizations based on the communication/traffic networks of their members. The command structures of terrorist organizations are usually pyramid diagrams outlining their support systems, consisting of the leaders, operational cadre, active supporters and passive supporters [^]. Uncovering their internal operational structure and determining the roles of members will be helpful for conducting precise strikes against their key leaders and avoiding tragic events like 9/11 [^]. (2) inferring the social hierarchies of animals based on their observed interaction networks [^]. Many animals (e.g., mammals, birds and insect species) are usually organized into dominance hierarchies. Identifying and understanding the organizational hierarchies of animals will be helpful for designing and carrying out effective conservation measures to protect them.

Despite its importance, IOC is a novel problem and we are the first to propose to study it based on online ESNs. The IOC problem is totally different from existing works: (1) “hierarchy detection in social networks” [^], which only studies the division of regular social network users, who are not actually involved in any organization, into different hierarchies; (2) “organizational intrusion” [^], which focuses only on attacking organizations and attaining company internal information; and (3) “inferring offline hierarchy from social networks” [^], which merely infers fragments of offline hierarchical ties in homogeneous networks, instead of reconstructing the whole organizational chart. Different from all these works, in this paper, we aim at recovering the complete organizational chart of a company (including both the hierarchical tiers of employees and the supervision links from managers to subordinates) based on the heterogeneous information about the employees in online ESNs.

Meanwhile, to guarantee the smooth operations of companies, the inferred organizational chart needs to meet certain structural requirements [^], including both (1) the macro-level depth requirement, and (2) the micro-level width requirement. Two classical organizational structures adopted by companies are the vertical structure and the horizontal structure [^]. A vertical organizational structure with well-defined chains of command clearly outlines the responsibilities of each employee but results in delays in information delivery [^].



Figure 2: Examples of organizational charts. 


Meanwhile, a horizontal organizational structure with a flat command system involves everyone in decision making but leads to difficulties in coordinating the activities of different departments [^]. For instance, based on the input social network shown in plot A of Figure 2, we give two extreme cases of the vertical and horizontal organizational charts without depth regulation in plots B and C respectively, both of which will lead to serious management problems for large companies involving tens of thousands of employees. Proper regulation of the inferred organizational chart’s depth (i.e., the macro-level depth requirement) is generally desired. On the other hand, most employees in companies need good supervisors to coach and instruct their daily work, but the number of subordinates each manager can supervise is limited by his/her management capacity, available time and energy. Rationally regulating the allocation of supervision workload among managers (i.e., the micro-level width requirement) can improve management effectiveness significantly. For instance, in plot D of Figure 2, we show an inferred organizational chart with depth regulation but no subordinate allocation regulation. In the plot, users in the ESN are stratified into 3 tiers (which is relatively reasonable compared to the extreme cases in plots B and C), but the management workload for all employees at tier 3 is assigned to one single manager, which may be beyond his/her management ability.

Despite its importance and novelty, the IOC problem is very hard to solve due to the following challenges:

• regulated social stratification: Effective social stratification to partition users into different hierarchical arrangements (i.e., identifying the relative manager-subordinate roles of employees) while meeting the macro-level depth requirement is the prerequisite for addressing the IOC problem.

• supervision link inference: The supervision link is a new type of link existing only from managers to their subordinates. Predicting the existence of potential supervision links with the heterogeneous information in ESNs is still an open problem.

• regulated supervision workload allocation: To maximize management effectiveness and efficiency, the number of subordinates each manager can supervise is limited by the management threshold K. In other words, supervision links in the organizational chart have an inherent K-to-one constraint.

To address all the above challenges, a new unsupervised organizational chart inference framework named Create (ChArT REcovEr) is proposed in this paper. Several new concepts (e.g., class transcendence social links, Matthew Effect based constraints, and chart depth regulation constraints) are introduced, and Create resolves the regulated social stratification challenge by minimizing the existence of class transcendence social links in ESNs. Create tackles the supervision link inference challenge by aggregating multiple social meta paths in the ESN between two consecutive social hierarchies. Finally, Create handles the regulated supervision workload allocation challenge by applying network flow to match consecutive social hierarchies while preserving the K-to-one constraint on supervision links.

The remaining parts of the paper are organized as follows. In Section 2, we define some important terminologies and the IOC problem. Method Create is introduced in Section 3. Extensive experiment results are available in Section 4. Finally, we describe the related works and conclude the paper in the last two sections.

2. PROBLEM FORMULATION 

In this section, we will first introduce the formal definitions of “heterogeneous social networks” and “organizational charts” and then define the IOC problem with these two concepts.

2.1 Terminology Definition 

Definition 1 (Heterogeneous Social Networks): A heterogeneous social network can be represented as $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \bigcup_i \mathcal{V}_i$ and $\mathcal{E} = \bigcup_i \mathcal{E}_i$ are the sets of different types of nodes and complex links among these nodes in the network respectively.

As introduced in Section 1, users in online ESNs (e.g., Yammer) have various types of social activities, e.g., follow other users, join groups, write/reply/like online posts, etc. As a result, Yammer can be represented as a heterogeneous social network $G = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \mathcal{U} \cup \mathcal{G} \cup \mathcal{P}$ is the set of user, group and post nodes in $G$, and $\mathcal{E} = \mathcal{S} \cup \mathcal{J} \cup \mathcal{W} \cup \mathcal{R} \cup \mathcal{K}$ denotes the set of social links among users, join links between users and groups, as well as write, reply and like links between users and posts respectively.

Definition 2 (Organizational Chart): The organizational chart of a company can be represented as a rooted tree $T = (\mathcal{N}, \mathcal{L}, root)$, where $\mathcal{N}$ is the set of employees, $\mathcal{L}$ denotes the set of directed supervision links from managers to subordinates in $T$, and $root$ represents the CEO of the company.


2.2 Problem Definition 

Based on the definitions of heterogeneous social network 
and organizational chart, we can define the IOC problem 
formally as follows: 

Definition 3 (Organizational Chart Inference (IOC)): Given an online ESN $G = (\mathcal{V}, \mathcal{E})$ launched in a company, the IOC problem aims at inferring the most likely organizational chart $T = (\mathcal{N}, \mathcal{L}, root)$ of the company, where $\mathcal{N} = \mathcal{U}$ ($\mathcal{U}$ is the user set in network $G$). Furthermore, considering that the node set as well as the root node in $T$ are fixed, the IOC problem actually aims at inferring the $|\mathcal{N}| - 1$ most likely supervision links $\mathcal{L}$ among employees. The inferred supervision links together with the node set $\mathcal{N}$ and the root node recover the original organizational chart $T = (\mathcal{N}, \mathcal{L}, root)$ of the company.
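To make the rooted-tree requirement of Definitions 2 and 3 concrete, the following minimal Python sketch checks whether a candidate set of supervision links forms a valid organizational chart: exactly $|\mathcal{N}| - 1$ links, one manager per non-root employee, and every employee reachable from the root. The function name `is_valid_chart` and the employee names are hypothetical and used only for illustration; they are not part of the Create framework.

```python
def is_valid_chart(employees, supervision_links, root):
    """Return True if (employees, supervision_links, root) is a rooted tree.

    supervision_links is a list of (manager, subordinate) pairs.
    """
    employees = set(employees)
    if root not in employees or len(supervision_links) != len(employees) - 1:
        return False
    manager = {}
    for m, s in supervision_links:
        # Each subordinate may report to only one manager; the root has none.
        if m not in employees or s not in employees or s == root or s in manager:
            return False
        manager[s] = m
    # Every employee must reach the root by following manager links.
    for e in employees:
        seen = set()
        while e != root:
            if e in seen or e not in manager:
                return False
            seen.add(e)
            e = manager[e]
    return True


# Toy chart mirroring the example names used in the introduction.
chart = [("Adam", "Bob"), ("Adam", "Candy"), ("Bob", "David")]
print(is_valid_chart({"Adam", "Bob", "Candy", "David"}, chart, "Adam"))
```

The check also makes explicit why the inference target reduces to choosing $|\mathcal{N}| - 1$ links once the node set and the root are fixed.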


3. PROPOSED METHODS 

Considering that supervision links exist merely between managers and subordinates, we propose to stratify users in enterprise social networks into different social classes to identify their relative manager-subordinate roles in Subsection 3.1. The macro-level depth requirement of the inferred chart is achieved with the depth regulation constraint in the social stratification objective function. Potential supervision links can be inferred between employees in consecutive classes by aggregating social meta paths among employees in the ESN in Subsection 3.2. To preserve the K-to-one constraint on supervision links (i.e., the micro-level width requirement), redundant non-existing supervision links will be pruned in Subsection 3.3. Generally, as shown in Figure 3, framework Create has three steps: (1) regulated social stratification, (2) supervision link inference, and (3) regulated social class matching, which will all be introduced in this section.

3.1 Regulated Social Stratification 

Supervision links merely exist between managers and subordinates. Dividing users into hierarchies to identify their relative manager-subordinate roles can shrink the supervision link inference space greatly. The process of hierarchizing users in an online ESN is formally called social stratification.

Definition 4 (Social Stratification): The traditional social stratification concept used in social science denotes the ranking and partition of people into different hierarchies based on various factors, e.g., power, wealth, knowledge and importance [^]. In this paper, we define social stratification as the process of partitioning users in online ESNs into different hierarchies according to their management relationships, where managers are at upper levels, while subordinates are at lower levels.

The relative stratified levels of users in an online ESN are defined as their social classes.

Definition 5 (Social Class): Social class is a term used by social stratification models in social science, and the most common ones are the upper, middle, and lower classes [^]. In this paper, we define the social class of users in online ESNs as their management level in the company, where the CEO belongs to social class 1, EVPs belong to class 2, and so forth.

In social stratification, users in online ESNs are mapped to their social classes according to the mapping $c: \mathcal{U} \to \mathbb{Z}^+$. For each user $u \in \mathcal{U}$, his/her social class $c(u)$ is defined recursively as follows:

$$c(u) = \begin{cases} 1, & \text{if } u \text{ is the CEO;} \\ c(m(u)) + 1, & \text{otherwise,} \end{cases}$$

where $m(u)$ represents the direct manager of $u$.

In social science, the working class are eager to get acquainted with and join the upper echelons of their class by either accumulating wealth [^], imitating their dressing styles [^], or mimicking their dialect and accents [^]. Meanwhile, the upper class are very cohesive and tend to befriend those who share a similar background [^]. The same holds for the social links in enterprise social networks. By analyzing the Yammer network data, we observe that the probability for users to follow upper-level managers is 31.9% on average, while that of following subordinates is merely 11.2%. As a result, in online ESNs, subordinates tend to follow their managers, while people in management are reluctant to initiate friendships with their subordinates [^]. Based on this observation, we introduce the concept of class transcendence social links and propose to stratify users by minimizing the existence of such links in ESNs.

Definition 6 (Class Transcendence Social Link): Link $(u, v)$ (i.e., $u$ follows $v$) is defined as a class transcendence social link in online ESN $G$ iff $(u, v) \in \mathcal{S}$ and $c(u) < c(v)$ (where a smaller social class denotes an upper management level in the organizational chart).

In social stratification, each class transcendence social link introduced in the result leads to a class transcendence penalty. Let $c(\mathcal{U}) = \{c(u_1), c(u_2), \cdots, c(u_{|\mathcal{U}|})\}$ be the social stratification result of all users in the ESN. For any directed social link $(u, v) \in \mathcal{S}$ in the ESN, the class transcendence penalty introduced by it can be represented as

$$p(c(u), c(v)) = \begin{cases} 0, & \text{if } c(u) > c(v); \\ c(v) - c(u) + 1, & \text{otherwise.} \end{cases}$$

The class transcendence penalty introduced by all social links (i.e., $\mathcal{S}$) in the ESN can be represented as

$$p(c(\mathcal{U})) = \sum_{(u,v) \in \mathcal{S}} p(c(u), c(v)) = \sum_{(u,v) \in \mathcal{S}} \max\{c(v) - c(u) + 1, 0\}.$$

“The rich get richer” (i.e., the Matthew Effect [^]) is a common phenomenon in social science, referring to issues of fame or status as well as the cumulative advantage of economic capital. By analyzing the Yammer network data, we have a similar observation: “people at higher management levels can accumulate more followers easily”. Such an observation provides important hints for inferring users’ relative management levels according to their in-degrees in the ESN (i.e., the number of followers).

Definition 7 (Matthew Effect based Constraint): For any two given users $u$ and $v$ in the network, let $\Gamma(u)$ and $\Gamma(v)$ be the follower sets of $u$ and $v$ in the network respectively. The Matthew Effect based constraint on users $u$ and $v$ can be represented as $c(u) < c(v)$ if $|\Gamma(u)| > |\Gamma(v)|$.

Furthermore, to maximize the operation efficiency of companies, the inferred organizational chart needs to meet the macro-level depth requirement, which can be achieved with the following chart depth regulation constraint.

Definition 8 (Chart Depth Regulation Constraint): The chart depth regulation constraint avoids obtaining an organizational chart with too short command chains (e.g., the extreme horizontal structure) and can be represented as

$$\sum_{u \in \mathcal{U}} c(u) > \alpha \cdot |\mathcal{U}|,$$

where parameter $\alpha$ is used to regulate the depth of the chart, whose sensitivity analysis will be given in Section 4. Furthermore, the term $\sum_{u \in \mathcal{U}} c(u)$ is added to the minimization objective function to avoid obtaining charts with too long command chains (i.e., the extreme vertical structure).

Based on all the above remarks, the optimal regulated social stratification $c^*(\mathcal{U})$ of users in the ESN can be obtained by solving the following objective function:

$$\begin{aligned}
c^*(\mathcal{U}) = \arg\min_{\{c(u_1), c(u_2), \cdots, c(u_{|\mathcal{U}|})\}} \ & \sum_{(u,v) \in \mathcal{S}} p(c(u), c(v)) + \sum_{u \in \mathcal{U}} c(u), \\
\text{s.t.} \ & p(c(u), c(v)) \ge c(v) - c(u) + 1, \ \forall (u,v) \in \mathcal{S}, \\
& p(c(u), c(v)) \ge 0, \ \forall (u,v) \in \mathcal{S}, \\
& c(u) < c(v), \ \forall u, v \in \mathcal{U}, \ \text{if } |\Gamma(u)| > |\Gamma(v)|, \\
& \sum_{u \in \mathcal{U}} c(u) > \alpha \cdot |\mathcal{U}|, \\
& c(u) = 1, \ u \text{ is the CEO}, \\
& c(u) > 1, \ c(u) \in \mathbb{Z}^+, \ \forall u \in \mathcal{U} \setminus \{\text{CEO}\}, \\
& p(c(u), c(v)) \in \mathbb{Z}, \ \forall (u,v) \in \mathcal{S}.
\end{aligned}$$

The integer programming objective function can be readily solved with open source toolkits, e.g., GLPK^3 and PuLP^4, and the obtained values of variables $c(u_1), c(u_2), \cdots, c(u_{|\mathcal{U}|})$ represent the inferred social classes of users in the online ESN.
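As an illustration of how the class transcendence penalty, the Matthew Effect based constraint and the chart depth regulation constraint interact, the following Python sketch solves the stratification objective on a toy follow network by exhaustive search. This is only a stand-in for an integer programming solver and is viable for a handful of users; the function names, the toy users, the bound `max_class`, and the value of `alpha` are assumptions made for the example.

```python
from itertools import product


def transcendence_penalty(classes, follow_links):
    """Sum of max(c(v) - c(u) + 1, 0) over follow links (u follows v)."""
    return sum(max(classes[v] - classes[u] + 1, 0) for u, v in follow_links)


def stratify_brute_force(users, ceo, follow_links, alpha, max_class):
    """Try every class assignment and keep the cheapest feasible one."""
    followers = {u: 0 for u in users}
    for _, v in follow_links:
        followers[v] += 1
    others = [u for u in users if u != ceo]
    best, best_cost = None, None
    # Non-CEO users take classes 2..max_class; the CEO is fixed at class 1.
    for combo in product(range(2, max_class + 1), repeat=len(others)):
        classes = dict(zip(others, combo))
        classes[ceo] = 1
        if sum(classes.values()) <= alpha * len(users):
            continue  # violates the chart depth regulation constraint
        if any(followers[u] > followers[v] and classes[u] >= classes[v]
               for u in users for v in users):
            continue  # violates the Matthew Effect based constraint
        cost = transcendence_penalty(classes, follow_links) + sum(classes.values())
        if best_cost is None or cost < best_cost:
            best, best_cost = classes, cost
    return best, best_cost


# Toy network: each pair (u, v) means "u follows v".
users = ["adam", "bob", "candy", "david", "eve"]
links = [("bob", "adam"), ("candy", "adam"), ("david", "bob"), ("eve", "candy")]
print(stratify_brute_force(users, "adam", links, alpha=2, max_class=4))
```

On realistic inputs the same objective and constraints would be handed to an integer programming toolkit instead of being enumerated.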


3.2 Supervision Link Inference with Social Meta 
Paths Aggregation 

It is a challenge to estimate the supervision relations between the ESN members in consecutive social classes. Here, the meta path concept introduced in [^, 40] is used to identify and evaluate the different types of relationships in the ESN.

Social Meta Paths in Enterprise Social Networks

• Follow: User $\xrightarrow{follow}$ User, whose notation is “$U \to U$” or $\Phi_1(U, U)$.

• Follower of Follower: User $\xrightarrow{follow}$ User $\xrightarrow{follow}$ User, whose notation is “$U \to U \to U$” or $\Phi_2(U, U)$.

• Common Followee: User $\xrightarrow{follow}$ User $\xrightarrow{follow^{-1}}$ User, whose notation is “$U \to U \leftarrow U$” or $\Phi_3(U, U)$.

• Common Follower: User $\xrightarrow{follow^{-1}}$ User $\xrightarrow{follow}$ User, whose notation is “$U \leftarrow U \to U$” or $\Phi_4(U, U)$.

• Common Group Membership: User $\xrightarrow{join}$ Group $\xrightarrow{join^{-1}}$ User, whose notation is “$U \to G \leftarrow U$” or $\Phi_5(U, U)$.

• Reply Post: User $\xrightarrow{write}$ Post $\xrightarrow{reply}$ Post $\xrightarrow{write^{-1}}$ User, whose notation is “$U \to P \to P \leftarrow U$” or $\Phi_6(U, U)$.

• Like Post: User $\xrightarrow{write}$ Post $\xrightarrow{like^{-1}}$ User, whose notation is “$U \to P \leftarrow U$” or $\Phi_7(U, U)$.

^3 https://www.gnu.org/software/glpk/
^4 https://code.google.com/p/pulp-or/

Figure 4: An example of K-to-one matching.

An existing user intimacy measure, Path-Sim, based on meta paths was introduced in [^], which can calculate the propagation probability between users via meta paths in undirected homogeneous networks. To deal with directed heterogeneous networks, we extend it and introduce a new intimacy measure, DP-intimacy (Directed Path-Intimacy), based on social meta path $\Phi_i(U, U), i \in \{1, 2, \cdots, 7\}$:

$$\text{DP-intimacy}_i(u, v) = \frac{|\mathcal{PATH}_i(u \rightsquigarrow v)| + |\mathcal{PATH}_i(v \rightsquigarrow u)|}{|\mathcal{PATH}_i(u \rightsquigarrow \cdot)| + |\mathcal{PATH}_i(v \rightsquigarrow \cdot)|},$$

where $\mathcal{PATH}_i(u \rightsquigarrow v)$ denotes the instance set of meta path $\Phi_i(U, U)$ going from $u$ to $v$ in the ESN.
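The DP-intimacy ratio can be sketched in a few lines of Python for the two simplest social meta paths, Follow (one follow hop) and Follower of Follower (two follow hops). The sketch assumes that the denominator counts all path instances that start at $u$ and at $v$ respectively, and that a pair with no outgoing instances receives score 0; the helper names `count_follow_paths` and `dp_intimacy` and the toy links are hypothetical.

```python
def count_follow_paths(follow_links, hops):
    """Count meta path instances made of `hops` consecutive follow links.

    Returns counts[u][v] = number of instances going from u to v.
    """
    out = {}
    for u, v in follow_links:
        out.setdefault(u, set()).add(v)
    counts = {}
    for start in out:
        frontier = {start: 1}
        for _ in range(hops):
            nxt = {}
            for node, n in frontier.items():
                for w in out.get(node, ()):
                    nxt[w] = nxt.get(w, 0) + n
            frontier = nxt
        counts[start] = frontier
    return counts


def dp_intimacy(counts, u, v):
    """(|P(u->v)| + |P(v->u)|) / (|P(u->.)| + |P(v->.)|), or 0.0 if undefined."""
    num = counts.get(u, {}).get(v, 0) + counts.get(v, {}).get(u, 0)
    den = sum(counts.get(u, {}).values()) + sum(counts.get(v, {}).values())
    return num / den if den else 0.0


# Toy follow links: (u, v) means "u follows v".
links = [("david", "bob"), ("bob", "adam"), ("candy", "adam"), ("david", "adam")]
follow = count_follow_paths(links, hops=1)
print(dp_intimacy(follow, "david", "bob"))
```

The remaining meta paths (common followee, group membership, posts) would need their own instance counters but can reuse `dp_intimacy` unchanged.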

Different social meta paths capture the intimacy between users in different aspects, and the overall intimacy between users can be obtained by aggregating information from all these social meta paths. Let $\text{DP-intimacy}_1(u, v), \text{DP-intimacy}_2(u, v), \cdots, \text{DP-intimacy}_7(u, v)$ be the intimacy scores between users $u$ and $v$ calculated based on social meta paths $\Phi_1(U, U), \Phi_2(U, U), \cdots, \Phi_7(U, U)$ respectively. Without loss of generality, we choose the logistic function as the intimacy aggregation function [^], and the overall intimacy between users $u$ and $v$ can be represented as

$$intimacy(u, v) = \frac{e^{\sum_i \omega_i \text{DP-intimacy}_i(u, v)}}{1 + e^{\sum_i \omega_i \text{DP-intimacy}_i(u, v)}},$$

where the value of $\omega_i$ denotes the weight of social meta path $\Phi_i(U, U)$ and $\sum_i \omega_i = 1$.
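The logistic aggregation is simple enough to state directly in code. The sketch below takes the per-meta-path DP-intimacy scores of one user pair and returns the aggregated intimacy; the function name `aggregate_intimacy` and the default of uniform weights are assumptions for illustration.

```python
import math


def aggregate_intimacy(scores, weights=None):
    """Logistic aggregation of per-meta-path DP-intimacy scores.

    scores  : list of DP-intimacy_i(u, v) values, one per social meta path
    weights : list of omega_i summing to 1 (uniform if omitted)
    """
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("one weight is required per meta path score")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("meta path weights must sum to 1")
    z = sum(w * s for w, s in zip(weights, scores))
    return math.exp(z) / (1.0 + math.exp(z))


# Seven meta path scores for one (u, v) pair, aggregated with uniform weights.
print(aggregate_intimacy([1.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0]))
```

Because each DP-intimacy score is non-negative, the aggregated value lies in $[0.5, 1)$, so only the relative order of candidate links matters in the later matching step.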


3.3 Regulated Social Class Matching 

The meta path aggregation based supervision link inference method proposed in the previous step calculates the intimacy scores of all the potential links between pairs of social classes. However, to regulate the supervision workload allocation, the number of subordinates each manager can supervise is limited by the management threshold K. In this section, we will prune the redundant non-existing supervision links with network-flow based regulated social class matching to preserve the K-to-one constraint on supervision links.


3.3.1 Bipartite Preference Graph 

Based on the social stratification results $c(\mathcal{U}) = \{c(u_1), c(u_2), \cdots, c(u_{|\mathcal{U}|})\}$, we can stratify all the users into social classes $[1, \max(c(\mathcal{U}))]$, and users in class $i \in [1, \max(c(\mathcal{U}))]$ can be represented as set $T(i) \subset \mathcal{U}$. By aggregating information in various social meta paths, we can calculate the intimacy scores of all potential supervision links between consecutive social classes. The potential supervision links between $T(i)$ and $T(i+1)$ can be represented as set $A(i, i+1) = T(i) \times T(i+1)$. Links in $A(i, i+1)$ are associated with certain weights (i.e., the calculated intimacy scores), which can be obtained with the mapping $n(i, i+1): A(i, i+1) \to \mathbb{R}$.

Users in social classes $i$ and $i+1$ (i.e., $T(i)$ and $T(i+1)$) together with all the potential supervision links between them (i.e., $A(i, i+1)$) and their intimacy scores (i.e., $n(i, i+1)$) can form a weighted bipartite preference graph.

Definition 9 (Weighted Bipartite Preference Graph): The weighted bipartite preference graph between users in $T(i)$ and $T(i+1)$ can be represented as $B = (T(i) \cup T(i+1), A(i, i+1), n(i, i+1))$.

An example of a weighted bipartite preference graph is shown in the upper plot of Figure 4. In the example, all the potential supervision links between the upper-level and lower-level individuals are represented as the directed purple lines between them, whose weights are the numbers marked on the lines. Each employee in the figure is associated with multiple potential supervision links, and the redundant ones can be pruned with the network flow method introduced in the next subsection.

3.3.2 Minimum Cost Network Flow based Social Class 
Matching 

Based on the bipartite preference graph $B$, we propose to construct the following network flow graph first.

Definition 10 (Network Flow Graph): Based on bipartite preference graph $B = (T(i) \cup T(i+1), A(i, i+1), n(i, i+1))$, the network flow graph can be represented as $H = (\mathcal{N}_H, \mathcal{L}_H, W_H)$. Node set $\mathcal{N}_H$ includes all nodes in $B$ and two dummy nodes: source node $s$ and sink node $t$ (i.e., $\mathcal{N}_H = T(i) \cup T(i+1) \cup \{s, t\}$). Besides all the links in $B$, we further add directed links from $s$ to all nodes in $T(i)$, as well as those from all nodes in $T(i+1)$ to $t$ (i.e., $\mathcal{L}_H = A(i, i+1) \cup (\{s\} \times T(i)) \cup (T(i+1) \times \{t\})$). Only the links in $A(i, i+1)$ are associated with weights, which can be obtained with mapping $W_H = n(i, i+1)$.

For instance, based on the bipartite preference graph in the upper plot of Figure 4, we can construct its corresponding network flow graph (i.e., the lower plot). All the links in the network flow graph are directed, denoting the flow direction.

Bound Constraint of Network Flow

For each link $(u, v) \in \mathcal{L}_H$, we allow a certain amount of flow going through within range $[lb_{u,v}, ub_{u,v}]$, where $lb_{u,v}$ and $ub_{u,v}$ represent the lower bound and upper bound associated with link $(u, v)$ respectively and

$$lb_{u,v} \le x_{u,v} \le ub_{u,v},$$

where $x_{u,v}$ is the flow amount going through link $(u, v)$.

More specifically, for links from $s$ to the upper-level individuals, i.e., $\{s\} \times T(i)$, we set the lower bound and upper bound to be $lb_{s,u} = 0$ and $ub_{s,u} = K$ respectively, and we get

$$0 \le x_{s,u} \le K, \ \forall u \in T(i),$$

where $K$ is the management threshold, whose sensitivity analysis is available in Section 4. It is actually the constraint that preserves the micro-level width requirement.

For link $(v, t) \in \mathcal{L}_H$, we set its lower bound and upper bound to be $lb_{v,t} = 1$ and $ub_{v,t} = 1$, i.e.,

$$x_{v,t} = 1, \ \forall v \in T(i+1),$$

which means exactly 1 unit of flow goes through link $(v, t)$ (i.e., each subordinate needs to have exactly one manager).





Links $(u, v) \in A(i, i+1) \subset \mathcal{L}_H$ have lower and upper bounds $lb_{u,v} = 0$, $ub_{u,v} = 1$, and the flow amount needs to be an integer, i.e.,

$$x_{u,v} \in \{0, 1\},$$

denoting whether supervision links in $A(i, i+1)$ are selected or not in the matching result.

Mass-Balance Constraint of Network Flow

In the network flow model, for each node, e.g., $u$, the amount of flow going into $u$ should be equal to that going out from $u$, i.e.,

$$\sum_{w \in \mathcal{N}_H, (w,u) \in \mathcal{L}_H} x_{w,u} = \sum_{v \in \mathcal{N}_H, (u,v) \in \mathcal{L}_H} x_{u,v}.$$

Minimum Cost Network Flow 

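All links going from $T(i)$ to $T(i+1)$ are associated with corresponding flow costs, which are negatively correlated with their intimacy scores. For instance, in this paper, for link $(u, v) \in \mathcal{L}_H$ with weight $intimacy(u, v)$, we represent its flow cost as $1 - intimacy(u, v)$. The optimal network flow with the minimum cost (i.e., the maximum intimacy) can be obtained by addressing the following integer programming problem:

$$\begin{aligned}
\min \ & \sum_{(u,v) \in A(i,i+1)} x_{u,v} \left(1 - intimacy(u, v)\right) \\
\text{s.t.} \ & 0 \le x_{s,u} \le K, \ \text{for } \forall u \in T(i), \\
& x_{v,t} = 1, \ \text{for } \forall v \in T(i+1), \\
& x_{u,v} \in \{0, 1\}, \ \text{for } \forall u \in T(i), \forall v \in T(i+1), \\
& \sum_{w \in \mathcal{N}_H, (w,u) \in \mathcal{L}_H} x_{w,u} = \sum_{v \in \mathcal{N}_H, (u,v) \in \mathcal{L}_H} x_{u,v}, \ \forall u \in \mathcal{N}_H.
\end{aligned}$$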

Similarly, the above integer programming problem can be addressed with open source toolkits, and the solving procedure will not be introduced here. The variable values obtained by solving the above problem achieve the minimum cost while meeting all the constraints. These variables denote the existence of the corresponding supervision links, where the selected links (i.e., those whose corresponding variable $x = 1$) are assigned label +1 while the rest are assigned label -1.
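For intuition, the K-to-one matching between two consecutive social classes can be reproduced on a toy instance by exhaustive search: every subordinate picks exactly one manager, no manager receives more than K subordinates, and the total cost $\sum (1 - intimacy)$ is minimized. This brute-force sketch is a stand-in for the minimum cost network flow solver, not the solver itself; the function name `match_classes` and the intimacy values are made up for the example.

```python
from itertools import product


def match_classes(managers, subordinates, intimacy, K):
    """Pick one manager per subordinate with at most K subordinates each.

    intimacy maps (manager, subordinate) pairs to scores in [0, 1].
    Returns (assignment, cost), or (None, inf) if no feasible matching exists.
    """
    best, best_cost = None, float("inf")
    for choice in product(managers, repeat=len(subordinates)):
        if any(choice.count(m) > K for m in managers):
            continue  # violates the K-to-one (width) constraint
        cost = sum(1.0 - intimacy[(m, s)] for m, s in zip(choice, subordinates))
        if cost < best_cost:
            best, best_cost = dict(zip(subordinates, choice)), cost
    return best, best_cost


# Toy preference graph between an upper class and the next lower class.
scores = {
    ("bob", "david"): 0.9, ("bob", "eve"): 0.8, ("bob", "frank"): 0.7,
    ("candy", "david"): 0.2, ("candy", "eve"): 0.3, ("candy", "frank"): 0.6,
}
print(match_classes(["bob", "candy"], ["david", "eve", "frank"], scores, K=2))
```

Without the threshold all three subordinates would be assigned to the same manager; with K = 2 the least costly reassignment is chosen, which is the pruning effect the matching step is designed to achieve.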

4. EXPERIMENTS 

To examine the performance of Create in addressing the IOC problem, in this part, extensive experiments are conducted on a real-world enterprise social network, Yammer.

4.1 Dataset Description 

We crawl all the Microsoft employees’ information from Yammer and obtain the complete organizational chart involving all these employees in Microsoft during June 2014. The social network data covers all the user-generated content (such as posts, replies, topics, etc.) and social graphs (such as user-user following links, user-group memberships, user-topic following links, etc.) that had been set to be public by then. In summary, it includes more than 100k Microsoft employees, as well as millions of user-generated posts and social links.^5

^5 We are not able to reveal the actual numbers here and throughout the paper for commercial reasons.

All the users in Yammer are registered with their official employment ID in Microsoft, via which we can identify them in the organizational chart correspondingly. From Microsoft, the complete organization structure of all employees is obtained. As introduced before, the structure of the organizational chart is a rooted tree with the CEO at the top.

4.2 Experiment Settings 

The Create framework proposed in this paper is an unsupervised model, and the organizational chart is used for evaluation only in the experiments. To ensure that the employee node set of the organizational chart is identical to that of the ESN, a fully aligned Yammer network and organizational chart are sampled from the dataset. Initially, with the directed follow links among users in Yammer, we achieve the regulated social stratification of users by minimizing the number of class transcendence social links. All the potential supervision links between pairwise consecutive social classes in the social stratification result are inferred by aggregating information from various social meta paths in Yammer, and their existence likelihood is denoted by the intimacy score.

For simplicity, the weights of different social meta paths in the logistic function are assigned identical values, i.e., $\omega = [\frac{1}{7}, \cdots, \frac{1}{7}]$. A subset of these inferred supervision links will be selected via the regulated social class matching based on the network-flow model to preserve the K-to-one constraint on supervision links.

Meanwhile, to demonstrate the effectiveness of Create, we compare Create with many baseline methods, including both state-of-the-art and traditional methods in social stratification and organizational chart inference.

Social Stratification Methods: 

• Regulated Social Stratification: Regulated social stratification is the first step of Create proposed in this paper, which is also named Create for simplicity. Create exploits the concept of class transcendence social links and the Matthew Effect based constraint to stratify users in ESNs into different social classes. In addition, to regulate the depth of the inferred social classes of employees, Create further adds a chart depth regulation constraint into the objective function.

• Agony based Social Division: ASD is a state-of-the-art social division method proposed in [^], which detects the social hierarchies of regular users in general online social networks. ASD is not designed for organizational chart inference and considers neither the Matthew Effect based constraint nor the chart depth regulation constraint.

Organizational Chart Inference Methods: 

• Social Stratification -h Link Prediction P Matching (Create) 
Create is the framework proposed in this paper and it 

has three steps: (1) regulated social stratification, (2) 
link inference and (3) regulated social class matching. 

• Social Stratification + Link Prediction (Create-SL):
Create-SL contains two steps: (1) social stratification,
and (2) link prediction based on accumulated social
meta paths. Create-SL has no matching step to
enforce the micro-level width requirement, so its
outputs cannot meet the K-to-one constraint.

• Social Stratification + Matching (Create-SM):
Create-SM contains two steps: (1) social stratification, and (2)
social class matching. Create-SM has no supervision
link prediction step, and the social links from an upper-level
social class to the next lower-level class are regarded as the
potential supervision link candidates.

Figure 5: Sensitivity analysis of parameter α.


• Social Stratification (Create-S): Create-S is identical
to Create-SM except that it has no matching step
and outputs all the social links between sequential
hierarchies as the supervision links.

• Traditional Unsupervised Link Prediction Methods: No
existing supervised link prediction models can be
applied, as no labeled supervision links exist. For
completeness, we further compare our method with traditional
unsupervised baseline methods, which include Common
Neighbor (CN), Jaccard's Coefficient (JC) and
Adamic Adar (AA), applied between consecutive stratified
social classes.
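
A minimal sketch of the three unsupervised scores, assuming an
undirected social graph stored as a dictionary of neighbor sets.
The toy graph and the node names are hypothetical. In the S-CN,
S-JC and S-AA baselines, such scores would be computed only for
user pairs in consecutive stratified social classes.

```python
import math


def common_neighbors(adj, u, v):
    """Number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def jaccard_coefficient(adj, u, v):
    """Shared neighbors divided by the size of the neighbor union."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def adamic_adar(adj, u, v):
    """Shared neighbors weighted by the inverse log of their degree."""
    return sum(
        1.0 / math.log(len(adj[w]))
        for w in adj[u] & adj[v]
        if len(adj[w]) > 1  # skip degree-1 nodes, where log(1) = 0
    )


# Hypothetical toy graph: "m" sits one social class above "s".
adj = {
    "m": {"a", "b", "c"},
    "s": {"a", "b"},
    "a": {"m", "s"},
    "b": {"m", "s", "c"},
    "c": {"m", "b"},
}
cn = common_neighbors(adj, "m", "s")
jc = jaccard_coefficient(adj, "m", "s")
aa = adamic_adar(adj, "m", "s")
print(cn, round(jc, 3), round(aa, 3))
```

All three scores use only the link structure around a user pair,
which is why they serve as purely structural baselines here.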


Social Stratification Evaluation Metrics 

In the social stratification step, the outputs are the
inferred social classes of all the employees. By comparing
them with individuals' real-world social classes (i.e., the
ground truth), we can calculate the mean absolute error,
mean squared error and coefficient of determination
(i.e., R²) of the results. In addition, the ratio of
correctly stratified users (i.e., accuracy) can also be used to
measure the performance. So, the metrics used to evaluate
the performance of different social stratification methods
include mean absolute error (MAE), mean squared error
(MSE) [27], R², and accuracy.
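
The four metrics can be computed as in the following sketch,
which uses the standard definitions of MAE, MSE, R² and accuracy.
The social class labels are made-up toy data, and the function
name `stratification_metrics` is hypothetical.

```python
def stratification_metrics(true_classes, predicted_classes):
    """Return MAE, MSE, R^2 and accuracy of predicted social classes."""
    n = len(true_classes)
    errors = [p - t for t, p in zip(true_classes, predicted_classes)]
    squared_error = sum(e * e for e in errors)
    mean_true = sum(true_classes) / n
    total_variation = sum((t - mean_true) ** 2 for t in true_classes)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "mse": squared_error / n,
        # R^2 is negative when the predictions fit worse than always
        # predicting the mean of the ground truth.
        "r2": 1.0 - squared_error / total_variation,
        "accuracy": sum(e == 0 for e in errors) / n,
    }


truth = [1, 2, 2, 3, 3, 3]          # toy ground-truth social classes
close = [1, 2, 3, 3, 3, 3]          # one user placed one class too low
scrambled = [3, 3, 1, 1, 2, 1]      # heavily disordered prediction

good = stratification_metrics(truth, close)
bad = stratification_metrics(truth, scrambled)
print(good)
print(bad)
```

The scrambled prediction yields a negative R², which is the
situation in which the predicted classes are less informative
than a constant prediction at the mean social class.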

Organizational Chart Inference Evaluation Metrics 

Create-SL, Create-S and the traditional unsupervised
link prediction methods can only output the confidence
scores of all potential supervision links, without labels, so
their performance is evaluated by AUC and Precision@100.
Meanwhile, Create and Create-SM can output
both labels and scores of potential supervision links and,
besides AUC and Precision@100, we also evaluate their
performance with Precision, Recall and F1-score.
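
For the score-only methods, the two ranking metrics can be
sketched as follows. AUC is computed here through its pairwise
definition (the fraction of positive/negative pairs ranked
correctly, with ties counted as one half), which is quadratic in
the number of links but easy to verify. The scores and labels are
toy values, and k = 2 stands in for the k = 100 used in the
experiments.

```python
def auc(scores, labels):
    """Probability that a true link is scored above a false link."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def precision_at_k(scores, labels, k):
    """Fraction of true links among the k highest-scored candidates."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


# Toy confidence scores of five candidate supervision links and their
# ground-truth labels (1 = real supervision link).
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 0, 1, 0, 0]

auc_value = auc(scores, labels)
p_at_2 = precision_at_k(scores, labels, 2)
print(round(auc_value, 3), p_at_2)
```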


4.3 Social Stratification Results 

In social stratification, parameter α is applied to maintain
the macro-level depth requirement, i.e., it controls
the depth of the organizational chart. Before comparing the
performance of Create with ASD, we first analyze the
sensitivity of parameter α. We select α from
{1, 2, 3, 4, 5, 5.1, 5.3, 5.5, 5.7, 5.9, 6, 7, 8, 9} and obtain the
accuracy scores achieved by Create, as shown in Figure 5.

When parameter α is very small (e.g., from 1 to 3), we
observe that it has no effect on the performance of Create.
A possible reason is that the Matthew Effect based
constraint can already effectively outline the relative
hierarchical relationships among users in online ESNs, and the
average social class of users obtained with it is already
greater than 3. When α becomes larger (from 4 to 6), the
structure regulation constraint starts to matter more, and the
social stratification accuracy goes up steadily and achieves
its highest value at 5.5, which is the default value of α in later
experiments. That Create performs better as α increases shows
that the structure regulation constraint can stretch the
organizational structure and stratify users into their correct social
classes. However, as α further increases (i.e., from 6 to 9),
the accuracy achieved by Create decreases dramatically.
The reason may be that a larger α stretches the organizational
structure too much and puts many users into the wrong
social classes. For example, it is nearly impossible for users
to achieve 9 as the average social class, which is actually
the largest social class in the sampled fully aligned
organizational chart.

Social stratification results of Create and ASD are given
in Figures 6 and 7, where Figure 6 shows the results achieved by
Create and ASD at each social class (evaluated by precision
and recall respectively) and Figure 7 shows their overall
performance (evaluated by Accuracy, MAE, MSE and R²).

From the microscopic perspective, we observe that Create
performs better than ASD consistently at all social
classes. Create achieves both 1.0 precision and 1.0 recall
at social classes 1 and 2, i.e., Create identifies the top 2
management levels of the company correctly. The performance
of Create at other social classes is also very promising.
For instance, the precision scores achieved by Create at
social classes 3, 4, 7 and 8 (besides 1 and 2) are either 1.0 or
close to 1.0, and the recall scores of Create at social classes
5 and 6 are also very high, all of which outperform those of
ASD by significant margins.

From a macroscopic perspective, the performance of Create
in stratifying the whole user set in the ESN is very good
and much better than that of ASD. The Accuracy, MAE,
MSE and R² scores achieved by Create are 0.68, 0.43, 0.71
and 0.37 respectively, all of which are better than those
achieved by ASD. For example, the accuracy achieved by Create is
almost triple that obtained by ASD, while the MAE
and MSE obtained by Create are merely 28% and 19%
of those achieved by ASD. In addition, ASD gets negative
R² scores in identifying the social classes of users in ESNs,
which indicates that the social classes it identifies are heavily
disordered: a negative R² means that the predictions fit the
ground truth worse than a constant prediction at the mean
social class.

4.4 Organizational Chart Inference Results 

Create has proved effective in stratifying users in ESNs.
Based on the stratification, we further study its
performance in inferring the potential supervision links
between pairs of consecutive social classes. The results are
evaluated by AUC and Precision@100 in Table 1, as well as by
Recall, Precision and F1 in Figure 8.

In Table 1, we compare Create (with different management
thresholds K) with all the other baseline methods;
Create (with K = 15 and 20) performs
the best. Compared with Create-SL (or Create-S), Create
(or Create-SM), which has the matching step, can identify
supervision links more effectively. For instance, Create
(with K = 15) outperforms Create-SL by over 20%
in AUC and 6% in Precision@100, and Create-SM (with
K = 15) outperforms Create-S, most visibly in
Precision@100 (0.790 vs. 0.740). This demonstrates that the
matching step can effectively prune non-existing supervision
links and preserve the micro-level width requirement (i.e.,
the K-to-one constraint).

Figure 6: Precision and Recall achieved by Create and ASD at each social class of the organizational chart.

Figure 7: Performance comparison of Create and ASD evaluated by different metrics.

Table 1: Performance comparison of different organizational chart inference methods.

Method                 AUC      Precision@100
Create (K = 10)        0.856    0.830
Create (K = 15)        0.869    0.870
Create (K = 20)        0.869    0.870
Create-SL              0.719    0.820
Create-SM (K = 10)     0.610    0.720
Create-SM (K = 15)     0.630    0.790
Create-SM (K = 20)     0.630    0.790
Create-S               0.627    0.740
S-CN                   0.636    0.440
S-JC                   0.636    0.260
S-AA                   0.528    0.070

Compared with Create-SM and Create-S, Create, which
infers the potential supervision links based on heterogeneous
information in ESNs instead of merely regarding the social
links as supervision link candidates, achieves much better
results. For example, in Table 1, the AUC of Create is
38% higher than that of Create-SM and Create-S, while
the Precision@100 of Create is roughly 10% higher than that
of Create-SM. In addition, in Figure 8, the Recall, Precision
and F1 obtained by Create are almost triple those achieved by
Create-SM. This confirms the argument that heterogeneous
information in ESNs can capture the relationships among
colleagues (especially between managers and subordinates).

In addition, we also compare Create with traditional
unsupervised link prediction methods, including CN, JC and
AA, and the advantages of Create are very clear according
to Table 1: Create outperforms all these unsupervised
link prediction methods by significant margins.
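
The relative improvements quoted in this section can be checked
directly against Table 1. The sketch below copies the AUC and
Precision@100 values of four methods from the table; the helper
name `relative_gain` is hypothetical.

```python
# (AUC, Precision@100) values copied from Table 1.
table1 = {
    "Create (K = 15)": (0.869, 0.870),
    "Create-SL": (0.719, 0.820),
    "Create-SM (K = 15)": (0.630, 0.790),
    "Create-S": (0.627, 0.740),
}


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, e.g. 0.2 means +20%."""
    return (new - old) / old


create_auc, create_p100 = table1["Create (K = 15)"]
gains = {
    name: (relative_gain(create_auc, auc), relative_gain(create_p100, p100))
    for name, (auc, p100) in table1.items()
    if name != "Create (K = 15)"
}
for name, (auc_gain, p100_gain) in gains.items():
    print(f"{name}: AUC +{auc_gain:.1%}, Precision@100 +{p100_gain:.1%}")
```

The gains over Create-SL and Create-SM reproduce the figures of
roughly 20% and 6%, and 38% and 10%, stated in the text.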

4.5 Management Threshold Sensitivity Analysis

In social class matching, the management threshold
parameter K plays a key role in constraining the number of
supervision links connected to each manager. The sensitivity
of parameter K is analyzed in this section, and the
results achieved by Create (with different values of K),
evaluated by different metrics, are shown in Figure 9. A small
management threshold K (e.g., 5) limits each manager's number
of subordinates to 5 and preserves only the supervision links
with extremely high likelihood, but may miss many promising
ones. As threshold K increases, more links with
high likelihood are preserved and the metric scores
increase consistently. However, when the threshold K reaches
25, the performance of Create degrades dramatically.
A possible reason is that, with a larger threshold, each
manager can have too many supervision links, which may
exceed the number of subordinates they have in the real world.

Figure 8: Performance comparison of Create and Create-SM evaluated by different metrics (K = 15).

(a) Recall (b) Precision (c) F1

Figure 9: Sensitivity analysis of parameter K.
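
The effect of the management threshold K can be illustrated with
a small sketch. Note that this is a greedy simplification, not
the network-flow model used by Create: it scans candidate links
in decreasing score order and keeps a link only if the
subordinate has no manager yet and the manager has fewer than K
subordinates. It also assumes that each subordinate has exactly
one manager. The link scores and names are made up.

```python
def greedy_k_to_one(scored_links, k):
    """Pick at most k subordinates per manager, one manager per subordinate.

    scored_links: iterable of (manager, subordinate, score) tuples.
    Returns a dict mapping each matched subordinate to a manager.
    """
    load = {}      # manager -> number of subordinates kept so far
    assigned = {}  # subordinate -> chosen manager
    ranked = sorted(scored_links, key=lambda link: link[2], reverse=True)
    for manager, subordinate, _score in ranked:
        if subordinate in assigned:
            continue  # subordinate already has a manager
        if load.get(manager, 0) >= k:
            continue  # manager already supervises k subordinates
        assigned[subordinate] = manager
        load[manager] = load.get(manager, 0) + 1
    return assigned


# Hypothetical intimacy scores between two consecutive social classes.
links = [
    ("m1", "a", 0.9), ("m1", "b", 0.8), ("m1", "c", 0.7),
    ("m2", "c", 0.6), ("m2", "d", 0.5), ("m2", "a", 0.3),
]
tight = greedy_k_to_one(links, 1)
medium = greedy_k_to_one(links, 2)
loose = greedy_k_to_one(links, 3)
print(tight, medium, loose, sep="\n")
```

With k = 1 two subordinates stay unmatched, which mirrors the
missed links under a small threshold, while k = 3 lets one
manager absorb a subordinate that the k = 2 run assigns elsewhere.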


5. RELATED WORK 

Enterprise social networks are important sources from which
employees in companies can get reliable information. Ehrlich
et al. propose to search for experts in an enterprise with
both text and social network analysis techniques. They
propose to examine users' dynamic profile information and
the social distance to the expert before deciding how to
initiate contact. Enterprise social networks can bring
many benefits to companies, and the motivations for
enterprise social network adoption in companies have been
studied in detail. Users in enterprise social networks
connect with and learn from each other through personal and
professional sharing. People sensemaking and relation building
on an enterprise social network site is studied by DiMicco
et al. In addition, social connections among users in
enterprise social networks usually have multiple facets. Wu
et al. study the multiplexity of social connections among
users in enterprise social networks,
which includes both professional and personal closeness.

Several works infer the hierarchies of individuals from
social networks. A measure called agony has been proposed,
and a hierarchy detection method is obtained by minimizing
it. A random graph model with Markov chain Monte Carlo
sampling is proposed by Clauset et al., which can address
the problem of structural inference of hierarchies in networks.
Maiya et al. propose to identify the hierarchies in social
networks by maximum likelihood. These three
papers focus only on dividing individuals into different
hierarchies. An offline hierarchical ties inference method has been
proposed by Jaber et al. to discover offline links
among people based on a time-based model. However, none
of these papers can recover the whole organizational chart.

Cross-social-network studies have become a hot research
topic in recent years. Kong et al. are the first to
propose the concepts of “anchor links”, “anchor users”, “aligned
networks”, etc. A novel network anchoring method has been
proposed to address the network alignment problem.
Cross-network heterogeneous link prediction problems are
studied by Zhang et al. by transferring links
across partially aligned networks. Besides link prediction
problems, Jin et al. propose to partition multiple large-scale
social networks simultaneously, and Zhang et
al. study the community detection problem across partially
aligned networks in [37, 39]. Zhan et al. analyze the
information diffusion process across aligned networks [34].


6. CONCLUSION 

In this paper, we have studied the inference of organizational
chart (IOC) problem based on heterogeneous online
ESNs. To address the IOC problem, a new chart inference
framework, Create, has been proposed. Create
consists of 3 steps: (1) regulated social stratification, (2)
supervision link inference with social meta path aggregation,
and (3) regulated social class matching. Experiments
on a real-world ESN and organizational chart dataset have
demonstrated the effectiveness of Create.